Cost of QCD simulations with nf = 2 dynamical Wilson fermions 

Th. Lippert 

SESAM/TχL collaboration 

Department of Physics, University of Wuppertal, Germany 

Cost estimates for simulations of full QCD with nf = 2 Wilson fermions by hybrid Monte Carlo (HMC) are presented. 
The extrapolations are based on the average number of iterations, N_it, of the iterative solver within the fermionic 
part of the HMC molecular dynamics, which is closely related to the minimal eigenvalue of M†M. The cost 
formula is determined as the product of the scaling functions of the iterative solver and of the integrated 
autocorrelation time of 1/N_it, as functions of the inverse lattice pseudoscalar mass. Timings from SESAM/TχL fix the pre-factor. 
It is demonstrated that a 2-flavor dynamical determination of light hadron masses with a statistical precision 
comparable to the corresponding quenched results from CP-PACS is an appropriate task for a 100 Tflops system. 



1. INTRODUCTION 

Realistic QCD simulations with dynamical 
fermions must operate beyond the ρ decay 
threshold and closer to the continuum limit. High 
energy physics can profit from lattice QCD as 
soon as its systematic and statistical errors are 
comparable to or smaller than the experimental ones. 

To make the best use of the next generation 
of QCD machines for these physics goals, we 
should employ the most appropriate lattice 
actions and simulation algorithms. Hence, 
cost predictions are needed for different lattice 
discretizations and algorithms beyond m_PS/m_V = 0.5. 

As suggested by R. Kenway (Edinburgh), 
chairman of the Lattice 2001 panel discussion, we 
determine the costs of the SESAM and TχL 
experiments in order to estimate the costs of future 
hybrid Monte Carlo simulations with two degenerate 
flavors of standard Wilson fermions. 

2. SESAM/TχL SIMULATIONS 

SESAM/TχL has generated 10 ensembles of 
full QCD vacuum configurations with O(5000) 
HMC trajectories each, at β = 5.6 and 5.5 in 
the region 0.57 < m_PS/m_V < 0.85, with two 
flavors of Wilson fermions. The lattice sizes are 
16^3 x 32 (SESAM) and 24^3 x 40 (TχL), 
corresponding to physical sizes (set by the ρ mass) 
of 1.372(36) fm (SESAM) and 1.902(34) fm (TχL) 
after chiral extrapolation. We used the standard 
hybrid Monte Carlo algorithm, boosted by BiCGStab 
as solver, ll-SSOR preconditioning and the 
educated guess procedure (chronological inversion 
method). Running primarily on APE100 
systems at DESY/Zeuthen, DFG/Bielefeld and 
INFN/Rome, the simulations cost about 
0.06 Tflops-yrs in total. 
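
For orientation, the quoted total can be converted into a raw operation count. The line below is only a unit conversion; it assumes that "Tflops-yrs" denotes one year of sustained 10^12 floating-point operations per second and takes a year to be about 3.16 x 10^7 s:

```latex
0.06\ \text{Tflops-yr}
  \approx 0.06 \times 10^{12}\ \tfrac{\text{flop}}{\text{s}} \times 3.16 \times 10^{7}\ \text{s}
  \approx 1.9 \times 10^{18}\ \text{floating-point operations}
```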

In table 1 we give important quantities from 
SESAM/TχL. We have used both odd/even (o/e) 
and SSOR preconditioned fermion actions. The 
latter series are used for the cost analysis, as 
SSOR shows better scaling behavior. 

3. SCALING FITS 

Figure 1a presents fits to the average number 
of iterations, N_it, of the ll-SSOR preconditioned 
BiCGStab solver as a function of 1/(m_PS a). Note 
that the β = 5.6 result scales much better than 
that at β = 5.5. Presumably, smoother gauge fields 
allow for better ll-SSOR preconditioning. 
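
A scaling fit of this kind can be sketched as follows. This is a minimal illustration, assuming a pure power law N_it = c * (1/(m_PS a))^z and an unweighted least-squares fit in log-log space; the functional form, the function name `fit_power_law` and the input numbers are assumptions for illustration, not the collaboration's fit procedure or data.

```python
import numpy as np

def fit_power_law(x, y):
    """Least-squares fit of y = c * x**z in log-log space; returns (c, z)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A straight-line fit of log(y) against log(x) has slope z and intercept log(c).
    z, log_c = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(log_c)), float(z)

# Synthetic illustration only: these are NOT SESAM/TχL measurements.
inv_mps_a = np.array([2.0, 2.5, 3.0, 4.0])  # hypothetical values of 1/(m_PS a)
n_it = 30.0 * inv_mps_a**1.5                # generated from an exact power law
c, z = fit_power_law(inv_mps_a, n_it)
print(c, z)
```

A fit to real data would normally weight each point by its statistical error, which this sketch omits.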

Figure 1b shows fits to the integrated 
autocorrelation times of the series of 1/N_it, the inverse 
number of iterations. 1/N_it is related to the 
minimal eigenvalue of M†M. Its autocorrelation time 
is comparable to that of the topological charge. 
Again, we observe a strong β dependence. 
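
The integrated autocorrelation time of a series such as 1/N_it can be estimated along the following lines. This is a sketch under stated assumptions: it uses the convention tau_int = 1/2 + sum of the normalized autocorrelation function over lags 1..W with a fixed summation window W, which may differ from the convention and windowing used for the figure. The function name and the synthetic AR(1) test series are illustrative only.

```python
import numpy as np

def integrated_autocorrelation_time(series, window):
    """Estimate tau_int = 1/2 + sum_{t=1}^{window} rho(t) for a 1-D time series."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 0 < window < n:
        raise ValueError("window must satisfy 0 < window < len(series)")
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    # Normalized autocorrelation function rho(t) for lags 1..window.
    rho = np.array([np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
                    for t in range(1, window + 1)])
    return 0.5 + rho.sum()

# Synthetic AR(1) series standing in for 1/N_it per trajectory (not real data).
rng = np.random.default_rng(0)
phi = 0.5
noise = rng.normal(size=100_000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, x.size):
    x[i] = phi * x[i - 1] + noise[i]

# For this process the infinite-sum value is 0.5 + phi / (1 - phi) = 1.5;
# the estimate should land close to it, up to statistical fluctuations.
tau = integrated_autocorrelation_time(x, window=20)
print(tau)
```

In practice the window has to be chosen large compared with tau_int but small compared with the series length, which matters for runs of only O(5000) trajectories.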

Table 1 
Characteristic quantities from SESAM/TχL. The ll-SSOR trajectories are indicated by #. 

Table 2 
Costs of simulations with 2 flavors of Wilson fermions in analogy to the quenched setting of CP-PACS. 

Figure 1. Scaling of iterative solver and integrated autocorrelation time. 



4. COST FORMULA 

Comparing the sustained CPU time for one 
HMC trajectory on APE100 with the costs of 
the iterative solver determines the normalization 
of the cost function. Multiplication by the 
autocorrelation time gives the effort to 
generate one statistically independent configuration 
at a given β as a function of 1/(m_PS a). We assume 
the volume dependence to scale like (L/a)^5 (rather 
than (L/a)^4.55, as assumed elsewhere) and the 
temporal lattice extent to be 
twice as large as the spatial extent L. 



β = 5.6:  N_flops = 2.3(7) · 10^7 · (L/a)^5 · [1/(a m_PS)]^… 

β = 5.5:  N_flops = 1.6(4) · 10^7 · (L/a)^5 · [1/(a m_PS)]^… 

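
A cost formula of this shape can be evaluated numerically as sketched below. The sketch assumes the form prefactor * (L/a)^5 * (1/(a m_PS))^z suggested by the text, with the exponent z passed in explicitly because its fitted value is not reproduced here. The function names, the Julian-year conversion and all input numbers in the usage lines other than the β = 5.6 prefactor are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumed convention

def flops_per_independent_config(prefactor, l_over_a, inv_a_mps, exponent):
    """Evaluate prefactor * (L/a)**5 * (1/(a m_PS))**exponent."""
    return prefactor * l_over_a**5 * inv_a_mps**exponent

def tflops_years(n_flops, n_configs=1):
    """Convert a flop count into Tflops-years of sustained computing."""
    return n_flops * n_configs / (1e12 * SECONDS_PER_YEAR)

# Illustration only: the exponent 3.0 and 1/(a m_PS) = 4.0 are placeholders,
# not values taken from the SESAM/TχL fits.
n_flops = flops_per_independent_config(2.3e7, l_over_a=24, inv_a_mps=4.0, exponent=3.0)
print(n_flops, tflops_years(n_flops, n_configs=1000))
```

Keeping the exponent as an argument makes the strong β dependence of the scaling explicit: the same lattice and mass can give very different costs depending on which fit is inserted.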
5. EXTRAPOLATIONS 

The CP-PACS quenched simulations 
achieved finite-a results for light hadrons with 
errors < 1% and continuum results with errors 
between 1 and 3%. Linearly extrapolating (1) in 
a and m_PS/m_V, we find upper bounds on the CPU 
time (see table 2) needed to carry out an 
analogous simulation with nf = 2 Wilson fermions. 